AITopics | regularization effect

regularization effect

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

e6af401c28c1790eaef7d55c92ab6ab6-Paper.pdf

Neural Information Processing Systems, Feb-11-2026, 16:11:40 GMT

algorithm 1, arxiv preprint arxiv, minimizer, (11 more...)

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > Canada > Ontario > Toronto (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.56)

a9078e8653368c9c291ae2f8b74012e7-AuthorFeedback.pdf

Neural Information Processing Systems, Feb-9-2026, 17:55:22 GMT

experiment, latent dimension, linear layer, (16 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.99)

82ba9d6eee3f026be339bb287651c3d8-Paper.pdf

Neural Information Processing Systems, Feb-9-2026, 14:35:32 GMT

There is a stream of papers in the literature on the regularization effect of Dropout for linear models [Wager et al., 2013; Mianjy and Arora, 2019; Helmbold and Long, 2015; Cavazza et al., 2017].
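For linear least squares, the regularization effect these papers study can be checked numerically. With inverted dropout on the inputs (each entry kept with probability p and rescaled by 1/p), the expected squared loss equals the ordinary residual plus a ridge-like, data-dependent penalty ((1 - p)/p) * sum_j w_j^2 * sum_i x_ij^2. The sketch below compares a Monte Carlo estimate against that closed form. It is a standard calculation for this toy setting, not the analysis of any of the cited papers; the function names, the synthetic data, and p = 0.8 are assumptions for illustration.

```python
import numpy as np

def dropout_loss_mc(X, y, w, p, n_samples=5000, seed=0):
    """Monte Carlo estimate of the expected squared loss when each input
    entry is kept with probability p and rescaled by 1/p (inverted dropout)."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_samples):
        mask = rng.random(X.shape) < p        # Bernoulli(p) keep mask per entry
        preds = (X * mask / p) @ w            # predictions on dropped-out inputs
        total += np.sum((y - preds) ** 2)
    return total / n_samples

def dropout_loss_closed_form(X, y, w, p):
    """Least-squares loss plus the data-dependent ridge penalty
    ((1 - p) / p) * w^T diag(X^T X) w."""
    residual = np.sum((y - X @ w) ** 2)
    penalty = (1 - p) / p * np.sum((X ** 2) @ (w ** 2))
    return residual + penalty

# Hypothetical synthetic regression problem.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 5))
w = rng.normal(size=5)
y = X @ w + 0.1 * rng.normal(size=50)

mc = dropout_loss_mc(X, y, w, p=0.8)
cf = dropout_loss_closed_form(X, y, w, p=0.8)
print(mc, cf)
```

The penalty weights each coordinate of w by the energy of the corresponding feature, which is why dropout on a linear model behaves like an adaptive (feature-scaled) L2 regularizer rather than plain ridge regression.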

artificial intelligence, machine learning, regularization, (19 more...)

Country: Europe > France (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

2288f691b58edecadcc9a8691762b4fd-Paper.pdf

Neural Information Processing Systems, Feb-7-2026, 19:24:28 GMT

distillation, knowledge distillation, neural network, (16 more...)

Country:

North America > United States > California > Santa Clara County > Mountain View (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
(5 more...)

Industry: Education (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

2288f691b58edecadcc9a8691762b4fd-AuthorFeedback.pdf

Neural Information Processing Systems, Feb-7-2026, 19:24:17 GMT

neural network, regularization effect, sparsity, (14 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.75)

Towards Explaining the Regularization Effect of Initial Large Learning Rate in Training Neural Networks

Neural Information Processing Systems, Dec-25-2025, 22:31:21 GMT

Stochastic gradient descent with a large initial learning rate is widely used to train modern neural network architectures. Although a small initial learning rate allows faster training and better test performance early on, the large learning rate achieves better generalization soon after it is annealed. Towards explaining this phenomenon, we devise a setting in which we can prove that a two-layer network trained with a large initial learning rate and annealing generalizes better than the same network trained with a small learning rate from the start. The key insight of our analysis is that the order in which different types of patterns are learned is crucial: because the small learning rate model first memorizes low-noise, hard-to-fit patterns, it generalizes worse on hard-to-generalize, easier-to-fit patterns than its large learning rate counterpart. This insight carries over to a larger-scale setting: we demonstrate that one can add a small patch to CIFAR-10 images that is immediately memorizable by a model with a small initial learning rate but is ignored by the model with a large learning rate until after annealing. Our experiments show that this hurts the small learning rate model's accuracy on unmodified images, because it relies too heavily on the patch early on.
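To make the comparison concrete, the two training regimes can be written as learning-rate schedules. The sketch assumes a single step decay and picks the values (0.1 and 0.01, annealing at step 30 of 60) purely for illustration; they are not the paper's hyperparameters, and the helper name is hypothetical.

```python
def step_annealed_lr(step, initial_lr, anneal_step, decay=0.1):
    """Piecewise-constant schedule: hold initial_lr, then multiply it by decay once."""
    return initial_lr if step < anneal_step else initial_lr * decay

# Hypothetical settings for the two regimes compared in the abstract.
TOTAL_STEPS = 60
ANNEAL_STEP = 30

# Large initial learning rate, annealed by 10x at ANNEAL_STEP.
large_schedule = [step_annealed_lr(s, 0.1, ANNEAL_STEP) for s in range(TOTAL_STEPS)]
# Small learning rate from the start (decay=1.0 keeps it constant).
small_schedule = [step_annealed_lr(s, 0.01, ANNEAL_STEP, decay=1.0) for s in range(TOTAL_STEPS)]

print(large_schedule[ANNEAL_STEP - 1], large_schedule[ANNEAL_STEP], small_schedule[0])
```

In this setup the large-rate run is annealed down to the small rate, so the two runs differ only in the early phase, which is the phase the abstract argues decides which patterns are learned first.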

learning rate, name change, regularization effect, (4 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.59)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.39)

Label Noise SGD Provably Prefers Flat Global Minimizers

Neural Information Processing Systems, Dec-25-2025, 03:12:15 GMT

In overparameterized models, the noise in stochastic gradient descent (SGD) implicitly regularizes the optimization trajectory and determines which local minimum SGD converges to. Motivated by empirical studies showing that training with noisy labels improves generalization, we study the implicit regularization effect of SGD with label noise. We show that SGD with label noise converges to a stationary point of a regularized loss $L(\theta) + \lambda R(\theta)$, where $L(\theta)$ is the training loss, $\lambda$ is an effective regularization parameter that depends on the step size, the strength of the label noise, and the batch size, and $R(\theta)$ is an explicit regularizer that penalizes sharp minimizers. Our analysis uncovers an additional regularization effect of large learning rates, beyond the linear scaling rule, that penalizes large eigenvalues of the Hessian more than small ones. We also prove extensions to classification with general loss functions, significantly strengthening the prior work of Blanc et al. (extending it to global convergence and large learning rates) and of HaoChen et al. (extending it to general models).
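The mechanism can be seen on a toy model assumed purely for illustration (it is not from the paper): fit a single target y = 1 with the product f(u, v) = u * v under the squared loss 0.5 * (uv - y)^2. Every point with uv = 1 is a global minimizer, and the trace of the loss Hessian there is u^2 + v^2, so the flattest minimizer is u = v = 1. The function name, step size, noise level, and starting point below are assumptions.

```python
import numpy as np

def label_noise_sgd(u, v, y=1.0, lr=0.05, sigma=1.0, steps=5000, seed=0):
    """SGD on 0.5 * (u*v - y_noisy)^2, drawing fresh label noise every step."""
    rng = np.random.default_rng(seed)
    for _ in range(steps):
        noisy_y = y + sigma * rng.standard_normal()   # perturbed label for this step
        r = u * v - noisy_y                           # residual of f(u, v) = u * v
        # Simultaneous gradient step: d/du = r * v, d/dv = r * u.
        u, v = u - lr * r * v, v - lr * r * u
    return u, v

# Start on the minimizer manifold uv = 1, far from the flat point.
plain = label_noise_sgd(4.0, 0.25, sigma=0.0)   # no label noise
noisy = label_noise_sgd(4.0, 0.25, sigma=1.0)   # with label noise
print(plain, noisy)
```

Without noise, the run starts at the minimizer (4, 0.25) and never moves. With label noise, each update multiplies u^2 - v^2 by exactly (1 - lr^2 * r^2), so the imbalance decays geometrically and the iterate drifts along uv close to 1 toward the flat minimizer u = v, a small-scale instance of the sharpness penalty R described above.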

electronic proceedings, name change, provably prefer flat global minimizer, (5 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.60)

Stochastic Normalization

Neural Information Processing Systems, Dec-24-2025, 12:47:46 GMT

Fine-tuning pre-trained deep networks on a small dataset is an important component of the deep learning pipeline. A critical problem in fine-tuning is how to avoid over-fitting when data are limited. Existing efforts follow one of two approaches: (1) imposing regularization on parameters or features; or (2) transferring prior knowledge to fine-tuning by reusing pre-trained parameters. In this paper, we take an alternative approach and refactor the widely used Batch Normalization (BN) module to mitigate over-fitting. We propose a two-branch design in which one branch is normalized by mini-batch statistics and the other by moving statistics.
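A two-branch normalization layer of the kind described here might look like the NumPy sketch below, for inputs of shape (N, C). This excerpt does not say how the two branches are combined; the sketch assumes a per-channel random switch that picks the moving-statistics branch with probability p during training, and uses moving statistics at inference. The class name, p, momentum, and eps are hypothetical choices, and only the forward pass is shown.

```python
import numpy as np

class TwoBranchBatchNorm:
    """Hypothetical two-branch BN forward pass for inputs of shape (N, C)."""

    def __init__(self, num_channels, p=0.5, momentum=0.1, eps=1e-5, seed=0):
        self.p = p                    # probability of using the moving-statistics branch
        self.momentum = momentum
        self.eps = eps
        self.gamma = np.ones(num_channels)
        self.beta = np.zeros(num_channels)
        self.running_mean = np.zeros(num_channels)
        self.running_var = np.ones(num_channels)
        self.rng = np.random.default_rng(seed)

    def forward(self, x, training=True):
        if not training:
            x_hat = (x - self.running_mean) / np.sqrt(self.running_var + self.eps)
            return self.gamma * x_hat + self.beta
        batch_mean = x.mean(axis=0)
        batch_var = x.var(axis=0)
        # Branch 1: normalize with mini-batch statistics.
        x_batch = (x - batch_mean) / np.sqrt(batch_var + self.eps)
        # Branch 2: normalize with moving (running) statistics.
        x_moving = (x - self.running_mean) / np.sqrt(self.running_var + self.eps)
        # Per-channel switch (assumed): True selects the moving-statistics branch.
        use_moving = self.rng.random(x.shape[1]) < self.p
        x_hat = np.where(use_moving, x_moving, x_batch)
        # Update running statistics from the current mini-batch.
        m = self.momentum
        self.running_mean = (1 - m) * self.running_mean + m * batch_mean
        self.running_var = (1 - m) * self.running_var + m * batch_var
        return self.gamma * x_hat + self.beta

x = np.random.default_rng(2).normal(loc=3.0, scale=2.0, size=(64, 4))
bn_batch = TwoBranchBatchNorm(4, p=0.0)     # always the mini-batch branch
out = bn_batch.forward(x)
bn_moving = TwoBranchBatchNorm(4, p=1.0)    # always the moving-statistics branch
out2 = bn_moving.forward(x)
```

Setting p = 0 recovers ordinary training-mode Batch Normalization, while p = 1 normalizes every channel with the running estimates only; intermediate values keep the layer from depending entirely on either set of statistics.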

artificial intelligence, machine learning, proceedings, (9 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.40)

Self-Distillation Amplifies Regularization in Hilbert Space

Neural Information Processing Systems, Oct-2-2025, 11:11:23 GMT

Knowledge distillation, introduced in the deep learning context, is a method to transfer knowledge from one architecture to another. When the two architectures are identical, it is called self-distillation. The idea is to feed the predictions of the trained model back in as new target values for retraining (and possibly to iterate this loop a few times). It has been empirically observed that the self-distilled model often achieves higher accuracy on held-out data.
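The self-distillation loop can be sketched with kernel ridge regression standing in for the trained model, which keeps the fit inside a reproducing-kernel Hilbert space. The RBF kernel, its length scale, lambda = 0.1, the synthetic data, and five rounds are all assumptions for illustration, not the paper's experimental setup.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    """Gaussian (RBF) kernel matrix between 1-D input arrays a and b."""
    sq_dists = (a[:, None] - b[None, :]) ** 2
    return np.exp(-sq_dists / (2.0 * length_scale ** 2))

def self_distill(x, y, lam=0.1, rounds=5):
    """Fit kernel ridge regression, then repeatedly refit on its own predictions.

    Returns the list of in-sample predictions after each round."""
    K = rbf_kernel(x, x)
    targets = y.copy()
    history = []
    for _ in range(rounds):
        # Kernel ridge solution: alpha = (K + lam * I)^{-1} targets
        alpha = np.linalg.solve(K + lam * np.eye(len(x)), targets)
        preds = K @ alpha
        history.append(preds)
        targets = preds            # the model's predictions become the next targets
    return history

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 40)
y = np.sin(x) + 0.3 * rng.normal(size=x.size)
history = self_distill(x, y)
norms = [np.linalg.norm(p) for p in history]
print(norms)
```

Each round applies the same operator K(K + lambda I)^{-1}, whose eigenvalues d_i / (d_i + lambda) lie in [0, 1). After t rounds a component along eigendirection i is scaled by (d_i / (d_i + lambda))^t, so directions with small eigenvalues are suppressed increasingly strongly, one way to read the "amplified regularization" in the title.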

artificial intelligence, distillation, machine learning, (17 more...)

Country:

North America > United States > California > Santa Clara County > Mountain View (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
(5 more...)

Industry: Education (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)
